Reading recommendation
Since I am very methodical, this notebook is well structured, very detailed, and quite long. I warmly recommend reading it with an interactive table-of-contents side panel for convenient navigation. Jupyter with jupyter_contrib_nbextensions supports this and many more useful perks (warmly recommended!), but these are unsupported on GitHub or in a plain Jupyter, so -
Ways to view this notebook with an interactive navigation side-panel:
Author: oz.livneh@gmail.com
DatingAI is a personal research project that I started for my own interest, challenge, and experience, and also to demonstrate my skills (I am currently looking for a job or interesting offers), aiming to:
The architecture presented in this notebook predicts the user score given to each profile based on the entire varying-length sequence of images appearing in the profile. This lifts the assumption made in Image score regression, where each image independently predicts the score given to the profile it belongs to.
net_architecture options:
- 'my simple CNN' - a (very) simple CNN I wrote mainly for debugging
- 'inception v3'
- 'resnet18'

freeze_pretrained_net_weights options:
- True: weights of pretrained models are left untrained, except for the last layer.
- False: weights of pretrained models are trained entirely, starting from the pretrained weights.

In this demo I use Inception v3 with freeze_pretrained_net_weights=False.
images_batch:
[profile_images x channels x H x W]
-> image_model:
[profile_images x image_model_output_dim]
-> adding position information:
[profile_images x image_model_output_dim+1]
-> attention:
[profile_images x attention_dim]
-> reduction (according to self.attention_reduction):
[1 x attention_dim]
-> att_to_reg:
[1 x round(attention_dim/2)]
-> regression:
[1]
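The shape flow above can be sketched end-to-end with dummy tensors. This is a minimal sketch with made-up dimensions and plain nn.Linear stand-ins for the real image model and attention layers (not the actual DatingAI modules), using the 'min' reduction:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# made-up dimensions for illustration only
profile_images, channels, H, W = 3, 3, 32, 32
image_model_output_dim, attention_dim = 8, 4

images_batch = torch.randn(profile_images, channels, H, W)

# stand-in image model: [profile_images x C x H x W] -> [profile_images x image_model_output_dim]
image_model = nn.Sequential(nn.Flatten(), nn.Linear(channels * H * W, image_model_output_dim))
x = image_model(images_batch)                                   # [3 x 8]

# adding position information: one extra column holding each image's index
positions = torch.arange(profile_images).float().unsqueeze(1)   # [3 x 1]
x = torch.cat([x, positions], dim=1)                            # [3 x 9]

# attention: [profile_images x image_model_output_dim+1] -> [profile_images x attention_dim]
attention = nn.Linear(image_model_output_dim + 1, attention_dim)
x = F.relu(attention(x))                                        # [3 x 4]

# reduction over the profile's images ('min' here)
x, _ = x.min(dim=0, keepdim=True)                               # [1 x 4]

# att_to_reg, then regression to a single score
att_to_reg = nn.Linear(attention_dim, round(attention_dim / 2))
x = F.relu(att_to_reg(x))                                       # [1 x 2]
regression = nn.Linear(round(attention_dim / 2), 1)
out = regression(x)                                             # [1 x 1]
print(out.shape)
```

Whatever the number of profile images, the reduction step collapses the first dimension to 1, which is what lets profiles of different lengths produce a single score each.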
Batching (stacking of samples) is not trivial here.
Usually, batching in PyTorch is done only at the dataloader level, not at the dataset level, as in Image score regression.
However, here each sample in the PyTorch dataset represents a profile and already contains a batch of profile images: a tensor of shape [profile_images x channels x H x W]. Since each profile contains an arbitrary (non-negative integer) number of images, profiles with different numbers of images cannot be stacked together into one tensor.
Therefore, I wrote a custom collate function for the dataloaders that collates the image batches of different samples into a list. All images in a profile batch pass through the image net in parallel, the profile attention outputs are collected for all profiles in the batch, and then they pass together through the regression.
import torch
print('torch\t\ttested on 1.3\t\tcurrent:',torch.__version__)
import sys
print('Python\t\ttested on 3.7.3\t\tcurrent:',sys.version[:sys.version.find(' ')])
import numpy as np
print('numpy\t\ttested on 1.16.4\tcurrent:',np.__version__)
import pandas as pd
print('pandas\t\ttested on 0.24.2\tcurrent:',pd.__version__)
import matplotlib
print('matplotlib\ttested on 3.1.0\t\tcurrent:',matplotlib.__version__)
#--------- general -------------
debugging=True # executes the debugging sections, prints their results
# debugging=False
torch_manual_seed=0 # integer or None for no seed; for torch reproducibility, as much as possible
#torch_manual_seed=None
#--------- data -------------
data_folder_path=r'D:\AI Data\DatingAI\Data'
images_folder_path=r'D:\AI Data\DatingAI\Data\Images'
df_pickle_file_name='profiles_df.pickle'
"""
dataset downsampling (samples = profiles, each containing a different
number of images):
if max_dataset_length>0: builds a dataset by sampling only
max_dataset_length samples from all available data.
requires user approval!
if max_dataset_length<=0: not restricting dataset length - using all
available data
"""
max_dataset_length=200 # if positive: builds a dataset by sampling only max_dataset_length samples from all available data; requires user approval
# max_dataset_length=0 # if non-positive: not restricting dataset length - using all available data
seed_for_dataset_downsampling=0 # integer or None for no seed; for sampling max_dataset_length samples from dataset
"""
random_transforms - data augmentation by applying random transforms
(random crop, horizontal flip, color jitter etc.) on the PyTorch dataset
"""
# random_transforms='train & val' # data augmentation on both train and val phases
# random_transforms='train' # data augmentation only on train phase, validation is free of random transforms
random_transforms='none' # no data augmentation
load_all_images_to_RAM=False # default; loads images from hard drive for each sample in the batch by the PyTorch efficient (multi-processing) dataloader
# load_all_images_to_RAM=True # loads all dataset images to RAM; estimates dataset size and requires user approval
validation_ratio=0.5 # validation dataset ratio from total dataset length
max_images_per_profile=4 # if >0: deleting profiles having more than max_images_per_profile images
batch_size_int_or_ratio_float=4 # if int: each batch will contain this number of samples (profiles)
#batch_size_int_or_ratio_float=1e-2 # if float: batch_size=round(batch_size_over_dataset_length*dataset_length)
data_workers=0 # 0 means no multiprocessing in dataloaders
# data_workers='cpu cores' # sets data_workers=multiprocessing.cpu_count()
shuffle_dataset_indices_for_split=True # dataset indices for dataloaders are shuffled before splitting to train and validation indices
#shuffle_dataset_indices_for_split=False
dataset_shuffle_random_seed=0 # numpy seed for sampling the indices for the dataset, before splitting to train and val dataloaders
#dataset_shuffle_random_seed=None
dataloader_shuffle=True # samples are shuffled inside each dataloader, on each epoch
#dataloader_shuffle=False
#--------- net -------------
# image_net_architecture='my simple CNN'
# image_net_is_a_pretrained_model=False
# image_net_architecture='resnet18'
# image_net_is_a_pretrained_model=True
image_net_architecture='inception v3'
image_net_is_a_pretrained_model=True
freeze_pretrained_net_weights=True # freezes pretrained model weights except the last layer
# freeze_pretrained_net_weights=False # trains pretrained models entirely, all weights, starting from pretrained values
image_model_output_dim=8
attention_dim=4 # should be an even number, since after the attention layer there is nn.Linear(attention_dim,round(attention_dim/2)) and then regression layer
# attention_reduction='mean'
attention_reduction='min'
loss_name='MSE'
#--------- training -------------
train_model_else_load_weights=True
#train_model_else_load_weights=False # instead of training, loads a pre-trained model and uses it
force_train_evaluation_after_each_epoch=True # adds an evaluation pass over the training dataset after each epoch of training
# force_train_evaluation_after_each_epoch=False # default
epochs=20
# learning_rate=1e-3
# optimizer_name='SGD'
# SGD_momentum=0.7 # default: 0.9
learning_rate=5e-3
optimizer_name='Adam'
Adam_betas=(0.9,0.999) # default: (0.9,0.999)
lr_scheduler_decay_factor=0.99 # applies to all optimizers; every lr_scheduler_step_size scheduler steps, learning_rate*=lr_scheduler_decay_factor
lr_scheduler_step_size=1
best_model_criterion='min val epoch MSE' # criterion for choosing best net weights during training as the final weights
return_to_best_weights_in_the_end=True # when training completes, loads the weights of the best net, defined by best_model_criterion
#return_to_best_weights_in_the_end=False
period_in_seconds_to_log_loss=30 # <=0 means no logging during training, else: inter-epoch logging and reporting loss and metrics during training
#plot_realtime_stats_on_logging=True # incomplete implementation!
plot_realtime_stats_on_logging=False
#plot_realtime_stats_after_each_epoch=True
plot_realtime_stats_after_each_epoch=False
#plot_loss_in_log_scale=True
plot_loss_in_log_scale=False
#offer_mode_saving=True # offer a model-weights-saving UI after training (only if train_model_else_load_weights=True)
offer_mode_saving=False
models_folder_path=r'D:\My Documents\Dropbox\Python\DatingAI\Data\Saved Models'
import logging
logging.basicConfig(format='%(asctime)s %(funcName)s (%(levelname)s): %(message)s',
datefmt='%Y-%m-%d %H:%M:%S')
logger=logging.getLogger('main logger')
logger.setLevel(logging.INFO)
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import random
from collections import Counter
from time import time
import copy
import PIL
import multiprocessing
if data_workers=='cpu cores':
data_workers=multiprocessing.cpu_count()
import torch
if torch_manual_seed is not None: # torch.manual_seed requires an integer, so skip seeding on None
torch.manual_seed(torch_manual_seed)
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import DatingAI
if torch_manual_seed is not None:
DatingAI.torch.manual_seed(torch_manual_seed)
device=DatingAI.device
# a charm for interactive plotting in Jupyter notebook (useful for zooming, rotating 3D plots):
%matplotlib notebook
# for development: reload modules before each time executing user code
%load_ext autoreload
%autoreload 2
profiles_df_path=os.path.join(data_folder_path,df_pickle_file_name)
profiles_df=pd.read_pickle(profiles_df_path)
logger.info('read profiles_df in length of %d'%(len(profiles_df)))
simplified_columns=[] # original profiles_df column names, except the 'score (levels=%d)'%max_level column, which is renamed to 'score'
for col in profiles_df.columns:
if 'score' in col:
simplified_columns.append('score')
else:
simplified_columns.append(col)
profiles_df.columns=simplified_columns
images_per_profile_list=[len(profiles_df['image filenames'].iloc[i]) \
for i in range(len(profiles_df))]
DatingAI.easy_hist(images_per_profile_list,distribution_is_continuous=False,
label='images per profile');
if max_images_per_profile>0:
profiles_df['images per profile']=images_per_profile_list
over_loaded_profiles=profiles_df['images per profile']>max_images_per_profile
over_loaded_profiles_number=over_loaded_profiles.sum()
if over_loaded_profiles_number:
over_loaded_approval=input('%d profiles (%.1f%%) contain more than max_images_per_profile=%d images per profile, delete them? y/[n] '%(
over_loaded_profiles_number,
100*over_loaded_profiles_number/len(profiles_df),
max_images_per_profile))
if over_loaded_approval=='y':
profiles_df=profiles_df[profiles_df['images per profile']<=max_images_per_profile]
logger.info('deleted over-loaded profiles')
if max_dataset_length>0 and max_dataset_length<len(profiles_df):
user_data_approval=input('ATTENTION: downsampling is chosen - building a dataset by sampling only max_dataset_length=%d samples from all available data! approve? y/[n] '%(round(max_dataset_length)))
if user_data_approval!='y':
raise RuntimeError('user did not approve dataset max_dataset_length sampling!')
random.seed(seed_for_dataset_downsampling)
sampled_indices=random.sample(range(len(profiles_df)),max_dataset_length)
profiles_df=profiles_df.iloc[sampled_indices]
logger.info('downsampled profiles_df')
profiles_df_image_filenames_list=[]
for profile_index in range(len(profiles_df)):
images_filenames=profiles_df.iloc[profile_index]['image filenames']
profiles_df_image_filenames_list.extend(images_filenames)
profiles_df_image_filenames_set=set(profiles_df_image_filenames_list)
logger.info('profiles_df is in length of %d, containing %d unique image filenames'%(
len(profiles_df),len(profiles_df_image_filenames_set)))
image_num_to_sample=5
# end of inputs ---------------------------------------------------------------
if debugging:
logger.info('checking image shapes of %d sampled images'%image_num_to_sample)
sampled_image_filenames=random.sample(profiles_df_image_filenames_list,image_num_to_sample)
for image_filename in sampled_image_filenames:
image_array=plt.imread(os.path.join(images_folder_path,image_filename))
print(f'{image_filename} shape:',image_array.shape)
In this section the transforms are defined (not all are random; some are required independently of data augmentation), then a dataset is built and tested, and only later are the training and validation datasets and dataloaders built.
If load_all_images_to_RAM=True, the size of the dataset is estimated, and the user can then choose to load all images into RAM.
The non-trivial aspect of this dataset: each sample represents a profile, and therefore returns a batch of profile images, a tensor of shape [profile_images x channels x H x W]. Usually, batching (stacking of samples) is done only at the dataloader level, not at the dataset level. Compare with Image score regression.
random_transforms, set in # Setting main parameters, controls data augmentation:
- random_transforms='train & val': data augmentation on both train and val phases. The dataset created here is later split into training and validation datasets (and dataloaders).
- random_transforms='train': data augmentation only on the train phase; validation is free of random transforms. The dataset created here has transforms, and later separate train/val datasets (and dataloaders) are created with/without random transforms.
- random_transforms='none': no data augmentation. The dataset created here is later split into training and validation datasets (and dataloaders).

n_to_sample_for_data_size_estimation=10 # only used if load_all_images_to_RAM=True was set
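The per-profile dataset described above can be sketched as follows. This is a minimal illustrative stand-in for DatingAI.profile_images_dataset (the class name, the in-memory profile list, and the dict keys below are assumptions for the sketch, not the actual DatingAI code):

```python
import torch
from torch.utils.data import Dataset

class ProfileImagesDatasetSketch(Dataset):
    """Minimal sketch: each sample is a whole profile, returning a
    [profile_images x C x H x W] tensor plus the profile score."""
    def __init__(self, profiles, transform_func=None):
        self.profiles = profiles  # list of dicts: {'images': [...], 'score': float}
        self.transform_func = transform_func

    def __len__(self):
        return len(self.profiles)

    def __getitem__(self, index):
        profile = self.profiles[index]
        images = profile['images']
        if self.transform_func is not None:
            images = [self.transform_func(img) for img in images]
        # stacking works here because all images of ONE profile share a shape;
        # stacking across profiles is what fails (varying profile_images)
        return {'images batch': torch.stack(images),
                'profile score': profile['score'],
                'profile index': index}

# toy usage: two profiles with different image counts
profiles = [{'images': [torch.randn(3, 8, 8) for _ in range(n)], 'score': float(n)}
            for n in (2, 4)]
ds = ProfileImagesDatasetSketch(profiles)
print(ds[0]['images batch'].shape, ds[1]['images batch'].shape)
```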
"""torchvision.transforms.ToTensor() Converts a PIL Image or numpy.ndarray (H x W x C) in the
range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the
range [0.0, 1.0] if the PIL Image belongs to one of the modes
(L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has
dtype = np.uint8
In the other cases, tensors are returned without scaling
source: https://pytorch.org/docs/stable/torchvision/transforms.html
"""
if image_net_is_a_pretrained_model:
if image_net_architecture=='inception v3':
input_size_for_pretrained=299
else:
input_size_for_pretrained=224
transform_func_with_random=torchvision.transforms.Compose([
torchvision.transforms.Resize(input_size_for_pretrained+10),
torchvision.transforms.RandomCrop(input_size_for_pretrained),
torchvision.transforms.ColorJitter(brightness=0.1,contrast=0.1,saturation=0,hue=0),
torchvision.transforms.RandomHorizontalFlip(p=0.5),
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225]), # required for pre-trained torchvision models!
])
transform_func_no_random=torchvision.transforms.Compose([
torchvision.transforms.Resize(input_size_for_pretrained),
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225]), # required for pre-trained torchvision models!
])
else:
transform_func_with_random=torchvision.transforms.Compose([
# torchvision.transforms.Resize(400),
torchvision.transforms.RandomCrop(390),
torchvision.transforms.ColorJitter(brightness=0.1,contrast=0.1,saturation=0,hue=0),
torchvision.transforms.RandomHorizontalFlip(p=0.5),
torchvision.transforms.ToTensor(),
])
transform_func_no_random=torchvision.transforms.Compose([
torchvision.transforms.Resize(390),
torchvision.transforms.ToTensor(),
])
# end of inputs ---------------------------------------------------------------
# if random_transforms=='none':
# random_transforms_ui=input('random_transforms=False was set, meaning no data augmentation, approve? [y]/n ')
# if random_transforms_ui=='n':
# raise RuntimeError('user did not approve no data augmentation, aborting')
if random_transforms=='none':
transform_func=transform_func_no_random
else:
transform_func=transform_func_with_random
dataset=DatingAI.profile_images_dataset(profiles_df=profiles_df,
images_folder_path=images_folder_path,transform_func=transform_func)
if load_all_images_to_RAM:
# estimating dataset size based on sampled samples(images)
sampled_image_filenames=random.sample(profiles_df_image_filenames_list,n_to_sample_for_data_size_estimation)
sampled_images_dict_in_RAM=DatingAI.build_images_dict_in_RAM(
image_filenames_list=sampled_image_filenames,
images_folder_path=images_folder_path)
image_np_arrays_size_MB=sum([sys.getsizeof(np.array(image)) for image in sampled_images_dict_in_RAM.values()])/1e6
expected_sampled_images_dict_in_RAM_size_MB=image_np_arrays_size_MB/n_to_sample_for_data_size_estimation*len(dataset)
user_decision_RAM=input('load_all_images_to_RAM=True was set, estimated dataset size based on %d random samples: %.1eMB, load all images to RAM? y/[n] '%(
n_to_sample_for_data_size_estimation,expected_sampled_images_dict_in_RAM_size_MB))
if user_decision_RAM=='y':
logger.info('started loading all images to RAM')
images_dict_in_RAM=DatingAI.build_images_dict_in_RAM(
image_filenames_list=profiles_df_image_filenames_list,
images_folder_path=images_folder_path)
dataset=DatingAI.profile_images_dataset(
profiles_df=profiles_df,
images_dict_for_RAM_loading=images_dict_in_RAM,
transform_func=transform_func)
image_np_arrays_size_MB=sum([sys.getsizeof(np.array(image)) for image in images_dict_in_RAM.values()])/1e6
logger.info('completed loading all images to RAM, size: %.1eMB'%image_np_arrays_size_MB)
else:
logger.info('user disapproved loading all dataset to RAM, keeping it on the hard drive and loading with a dataloader')
#sample_indices_to_plot=range(20) # for dataset plotting verification
random.seed(1)
sample_indices_to_plot=random.sample(range(len(profiles_df)),3)
images_per_row=5
figure_size=(6,2) # (width,height) in inches
normalize=True
# end of inputs ---------------------------------------------------------------
if debugging:
for sample_index in sample_indices_to_plot:
sample=dataset[sample_index]
profile_index=sample['profile index']
profile_score=sample['profile score']
DatingAI.plot_profile_images_dataset(images_batch=sample['images batch'],
figure_size=figure_size,images_per_row=images_per_row,
normalize=normalize)
plt.title(f"images batch of profile {profile_index} (score: {profile_score})")
print(f"profiles_df.loc[{profile_index}]['score']:",profiles_df.loc[profile_index]['score'])
# plt.title(f"images batch of profile {sample['profile index']}")
if random_transforms!='none':
logger.info("random_transforms!='none', plotting the same profiles twice to see random transforms")
for sample_index in sample_indices_to_plot:
sample=dataset[sample_index]
DatingAI.plot_profile_images_dataset(images_batch=sample['images batch'],
figure_size=figure_size,images_per_row=images_per_row,
normalize=normalize)
plt.title(f"images batch of profile {sample['profile index']} (score: {sample['profile score']})")
# plt.title(f"images batch of profile {sample['profile index']}")
target_is_continuous=False # target is not continuous, profile scores are discrete
normalization='over total' # heights=counts/sum(discrete_hist)
opacity=0.6
# end of inputs ---------------------------------------------------------------
# splitting
dataset_length=len(profiles_df)
dataset_indices=list(range(dataset_length))
split_index=int((1-validation_ratio)*dataset_length)
if shuffle_dataset_indices_for_split:
np.random.seed(dataset_shuffle_random_seed)
np.random.shuffle(dataset_indices)
train_indices=dataset_indices[:split_index]
val_indices=dataset_indices[split_index:]
dataset_indices={'train':train_indices,'val':val_indices}
logger.info('dataset indices split to training and validation, with validation_ratio=%.1f, lengths: (train,val)=(%d,%d)'%(
validation_ratio,len(train_indices),len(val_indices)))
# plotting target distributions
plt.figure()
for phase in ['train','val']:
targets_list=profiles_df.iloc[dataset_indices[phase]]['score'].values
DatingAI.easy_hist(targets_list,distribution_is_continuous=target_is_continuous,
normalization=normalization,label=phase,opacity=opacity)
plt.title('training and validation target distributions')
plt.xlabel('target values')
plt.legend(loc='best');
This figure presents a few important things:
Since for each profile sample the dataset returns a batch of images of shape [profile_images x channels x H x W], samples with different numbers of images cannot be stacked together. Therefore, I wrote a custom collate function for the dataloaders that collates the image batches of different samples into a list.
I also use my custom collate to gather 'image filename' and 'profile id' into lists instead of tensors - PyTorch raises an error when collating overly long strings into tensors...
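The idea behind DatingAI.collate_samples_into_batch_functional can be sketched like this (an illustrative re-implementation under assumed dict-sample structure, not the actual DatingAI code): features listed as "unstacked" are gathered into plain Python lists, everything else is stacked into tensors as usual.

```python
import torch

def collate_keep_listed(unstacked_keys):
    """Returns a collate_fn that stacks tensor features with torch.stack,
    but leaves the listed keys as plain lists (variable-length image
    batches, strings, etc.)."""
    def collate_fn(samples):
        batch = {}
        for key in samples[0]:
            values = [sample[key] for sample in samples]
            if key in unstacked_keys:
                batch[key] = values  # kept as a list, not a tensor
            else:
                batch[key] = torch.stack([torch.as_tensor(v) for v in values])
        return batch
    return collate_fn

# toy batch: two profiles with 2 and 3 images each
samples = [
    {'images batch': torch.randn(2, 3, 8, 8), 'profile score': 4.0},
    {'images batch': torch.randn(3, 3, 8, 8), 'profile score': 2.0},
]
collate_fn = collate_keep_listed(['images batch'])
batch = collate_fn(samples)
print(len(batch['images batch']), batch['profile score'].shape)
```

Passing such a function as collate_fn to torch.utils.data.DataLoader replaces the default stacking, which would raise on the mismatched first dimensions.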
dataset_features_to_leave_unstacked=['images batch','image filenames','profile id']
# end of inputs ---------------------------------------------------------------
custom_collate_fn=DatingAI.collate_samples_into_batch_functional(dataset_features_to_leave_unstacked)
# setting batch size
if isinstance(batch_size_int_or_ratio_float,int):
batch_size=batch_size_int_or_ratio_float
elif isinstance(batch_size_int_or_ratio_float,float):
batch_size=round(batch_size_int_or_ratio_float*dataset_length)
else:
raise RuntimeError('unsupported batch_size input!')
if batch_size<1:
batch_size=1
logger.warning('batch_size=round(batch_size_int_or_ratio_float*dataset_length)<1, so batch_size=1 was set')
if batch_size==1:
user_batch_size=input('got batch_size=1, may cause errors, enter a new batch size equal to or larger than 1, or smaller than 1 to abort: ')
user_batch_size=float(user_batch_size) # input() returns a string - convert before comparing
if user_batch_size<1:
raise RuntimeError('aborted by user batch size decision')
else:
batch_size=round(user_batch_size)
# building datasets
if random_transforms=='train': # means applying random transforms only on train, so separate datasets must be created
if load_all_images_to_RAM and user_decision_RAM=='y':
train_dataset=DatingAI.profile_images_dataset(
profiles_df=profiles_df.iloc[train_indices],
images_dict_for_RAM_loading=images_dict_in_RAM,
transform_func=transform_func)
val_dataset=DatingAI.profile_images_dataset(
profiles_df=profiles_df.iloc[val_indices],
images_dict_for_RAM_loading=images_dict_in_RAM,
transform_func=transform_func)
else:
train_dataset=DatingAI.profile_images_dataset(
profiles_df=profiles_df.iloc[train_indices],
images_folder_path=images_folder_path,
transform_func=transform_func_with_random)
val_dataset=DatingAI.profile_images_dataset(
profiles_df=profiles_df.iloc[val_indices],
images_folder_path=images_folder_path,
transform_func=transform_func_no_random)
else:
dataset_to_split=dataset
# splitting the dataset to train and val
train_dataset=torch.utils.data.Subset(dataset_to_split,train_indices)
val_dataset=torch.utils.data.Subset(dataset_to_split,val_indices)
# building the train and val dataloaders
train_dataloader=torch.utils.data.DataLoader(train_dataset,batch_size=batch_size,
num_workers=data_workers,shuffle=dataloader_shuffle,
collate_fn=custom_collate_fn)
val_dataloader=torch.utils.data.DataLoader(val_dataset,batch_size=batch_size,
num_workers=data_workers,shuffle=dataloader_shuffle,
collate_fn=custom_collate_fn)
# structuring
datasets={'train':train_dataset,'val':val_dataset}
dataset_samples_number={'train':len(train_dataset),'val':len(val_dataset)}
dataloaders={'train':train_dataloader,'val':val_dataloader}
dataloader_batches_number={'train':len(train_dataloader),'val':len(val_dataloader)}
logger.info('dataset split to training and validation datasets and dataloaders with validation_ratio=%.1f, lengths: (train,val)=(%d,%d)'%(
validation_ratio,dataset_samples_number['train'],dataset_samples_number['val']))
images_per_row=5
figure_size=(6,2) # (width,height) in inches
normalize=True
# end of inputs ---------------------------------------------------------------
if debugging:
if __name__=='__main__' or data_workers==0:
for phase in ['train','val']:
batch=next(iter(dataloaders[phase]))
logger.info(f'{phase} batch contains {len(batch)} samples')
images_batches_list=batch['images batch']
for i_profile,images_batch in enumerate(images_batches_list):
images_batch=batch['images batch'][i_profile]
profile_index=batch['profile index'][i_profile].item()
profile_score=batch['profile score'][i_profile].item()
ax=DatingAI.plot_profile_images_dataset(images_batch=images_batch,
figure_size=figure_size,images_per_row=images_per_row,
normalize=normalize)
plt.title('%s dataloader, sample %d in batch (profile index: %d, score: %d)'%(
phase,i_profile,profile_index,profile_score))
print(f"profiles_df.loc[{profile_index}]['score']:",profiles_df.loc[profile_index]['score'])
else:
logger.warning('cannot use multiprocessing dataloaders (data_workers>0) on Windows when not executed as __main__')
if image_net_architecture=='inception v3':
image_model=torchvision.models.inception_v3(pretrained=True)
if freeze_pretrained_net_weights:
for param in image_model.parameters():
param.requires_grad=False
# Parameters of newly constructed modules have requires_grad=True by default:
image_model.AuxLogits.fc=nn.Linear(768,image_model_output_dim)
image_model.fc=nn.Linear(2048,image_model_output_dim)
elif image_net_architecture=='resnet18':
image_model=torchvision.models.resnet18(pretrained=True)
if freeze_pretrained_net_weights:
for param in image_model.parameters():
param.requires_grad=False
# parameters of newly constructed modules have requires_grad=True by default:
image_model.fc=nn.Linear(512,image_model_output_dim)
elif image_net_architecture=='my simple CNN':
class my_CNN(nn.Module):
def __init__(self):
super(my_CNN, self).__init__()
self.conv1 = nn.Conv2d(3,8,8,stride=4)
self.bn1=nn.BatchNorm2d(8)
self.pool = nn.MaxPool2d(2,2)
self.conv2 = nn.Conv2d(8,4,8,stride=4)
self.bn2=nn.BatchNorm2d(4)
self.fc1 = nn.Linear(100,24)
self.fc2 = nn.Linear(24,image_model_output_dim)
def forward(self,x):
x=F.relu(self.conv1(x))
x=self.bn1(x)
x=self.pool(x)
x=F.relu(self.conv2(x))
x=self.bn2(x)
x=self.pool(x)
x=x.view(-1,np.array(x.shape[1:]).prod()) # don't use x.view(batch_size,-1), which fails for batches smaller than batch_size (at the end of the dataloader)
x=F.relu(self.fc1(x))
x=self.fc2(x)
return x
image_model=my_CNN()
else:
raise RuntimeError('unsupported image_net_architecture input!')
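As a sanity check for the hard-coded nn.Linear(100,24) in my_CNN: assuming 390x390 inputs (the crop size used by the non-pretrained transforms above), the flattened feature size after the conv/pool stack works out to exactly 100:

```python
def conv_out(size, kernel, stride):
    # output spatial size of a conv/pool layer with no padding and no dilation
    return (size - kernel) // stride + 1

size = 390                   # RandomCrop(390) / Resize(390) in the non-pretrained transforms
size = conv_out(size, 8, 4)  # conv1: Conv2d(3,8,8,stride=4) -> 96
size = conv_out(size, 2, 2)  # pool:  MaxPool2d(2,2)         -> 48
size = conv_out(size, 8, 4)  # conv2: Conv2d(8,4,8,stride=4) -> 11
size = conv_out(size, 2, 2)  # pool:  MaxPool2d(2,2)         -> 5
flattened = 4 * size * size  # 4 channels after conv2        -> 100
print(flattened)  # 100
```

Any other input size would change this number, which is why the x.view(-1, ...) flattening in forward computes the product from the actual tensor shape.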
model=DatingAI.attention_on_images_net(image_net_architecture,image_model,
image_model_output_dim,attention_dim,attention_reduction)
if image_net_is_a_pretrained_model and freeze_pretrained_net_weights:
parameters_to_optimize=[]
for name,parameter in model.named_parameters():
if parameter.requires_grad:
parameters_to_optimize.append(parameter)
else:
parameters_to_optimize=model.parameters()
model=model.to(device)
total_weights_num=sum(p.numel() for p in model.parameters())
trainable_weights_num=sum(p.numel() for p in model.parameters() if p.requires_grad)
logger.info("set model on %s, trainable/total weigths: %d/%d"%(
device,trainable_weights_num,total_weights_num))
if loss_name=='MSE':
loss_fn=nn.MSELoss(reduction='mean').to(device)
else:
raise RuntimeError('untreated loss_name input')
if optimizer_name=='SGD':
optimizer=torch.optim.SGD(parameters_to_optimize,lr=learning_rate,momentum=SGD_momentum)
elif optimizer_name=='Adam':
optimizer=torch.optim.Adam(parameters_to_optimize,lr=learning_rate,betas=Adam_betas)
else:
raise RuntimeError('untreated optimizer_name input')
scheduler=torch.optim.lr_scheduler.StepLR(optimizer,
step_size=lr_scheduler_step_size,gamma=lr_scheduler_decay_factor)
if debugging:
if __name__=='__main__' or data_workers==0:
batch=next(iter(dataloaders['train'])) # explicitly 'train', rather than relying on a leftover phase variable
images_batch=batch['images batch'][0].to(device)
print('images_batch.shape:',images_batch.shape)
model.train()
with torch.set_grad_enabled(False):
if image_net_architecture=='inception v3' and model.training:
if images_batch.shape[0]>1:
image_model_outputs,aux_outputs=model.image_model(images_batch)
else: # for a single image batch normalization cannot operate, see comment above
model.image_model.eval()
image_model_outputs=model.image_model(images_batch)
model.image_model.train()
else:
image_model_outputs=model.image_model(images_batch)
print('image_model_outputs.shape:',image_model_outputs.shape)
# adding position information:
positions_tensor=torch.tensor(range(images_batch.shape[0])).to(device).float().unsqueeze(1)
image_model_outputs_plus_positions=torch.cat([image_model_outputs,positions_tensor],dim=1)
print('image_model_outputs_plus_positions.shape:',image_model_outputs_plus_positions.shape)
# applying attention
attention_outputs=F.relu(model.attention(image_model_outputs_plus_positions))
print('attention_outputs.shape:',attention_outputs.shape)
# reduction
if attention_reduction=='mean':
attention_outputs_tot=attention_outputs.mean(dim=0,keepdim=True)
elif attention_reduction=='min':
attention_outputs_tot,_=attention_outputs.min(dim=0,keepdim=True)
print('attention_outputs_tot.shape:',attention_outputs_tot.shape)
# applying regression
x=F.relu(model.att_to_reg(attention_outputs_tot))
regression_output=model.regression(x)
print('regression_output.shape:',regression_output.shape)
else:
logger.warning('cannot use multiprocessing dataloaders (data_workers>0) on Windows when not executed as __main__')
if debugging:
if __name__=='__main__' or data_workers==0:
batch=next(iter(dataloaders['train']))
labels=batch['profile score'].to(device).unsqueeze(1).float()
print('labels:',labels)
print('labels.shape:',labels.shape)
images_batches_list=batch['images batch']
model.train()
with torch.set_grad_enabled(False):
output_tensors_list=[]
for images_batch in images_batches_list:
images_batch=images_batch.to(device)
print('images_batch.shape:',images_batch.shape)
output=model(images_batch)
output_tensors_list.append(output)
outputs=torch.cat(output_tensors_list,dim=0)
print('outputs.shape:',outputs.shape)
loss=loss_fn(outputs,labels)
print('loss:',loss)
MSE=((outputs.flatten()-labels.flatten())**2).sum()/len(labels) # flatten both - [N] minus [N x 1] would broadcast into an [N x N] matrix
sqrt_MSE=MSE**0.5
sqrt_MSE=sqrt_MSE.item()
print('batch sqrt(MSE):',sqrt_MSE)
else:
logger.warning('cannot use multiprocessing dataloaders (data_workers>0) on Windows when not executed as __main__')
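A note on the shapes in the manual MSE computation above: the labels tensor is [N x 1] after unsqueeze(1), so flattening only one side of the subtraction broadcasts [N] against [N x 1] into an [N x N] matrix and silently inflates the sum. A minimal illustration with dummy tensors:

```python
import torch

outputs = torch.randn(4, 1)
labels = torch.randn(4, 1)

# flattening only one side: [4] - [4 x 1] broadcasts to a [4 x 4] matrix
wrong = (outputs.flatten() - labels) ** 2
# flattening both sides keeps the elementwise [4] shape
right = (outputs.flatten() - labels.flatten()) ** 2
print(wrong.shape, right.shape)
```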
def model_evaluation(model,dataloader,loss_fn):
model.eval() # set model to evaluate mode
epoch_loss=0.0 # must be a float
epoch_samples_number=0
label_arrays_list=[]
output_arrays_list=[]
for i_batch,batch in enumerate(dataloader):
labels=batch['profile score'].to(device).unsqueeze(1).float()
images_batches_list=batch['images batch']
# forward
with torch.set_grad_enabled(False): # no gradient tracking is needed for evaluation
output_tensors_list=[]
for images_batch in images_batches_list:
images_batch=images_batch.to(device)
output=model(images_batch)
output_tensors_list.append(output)
outputs=torch.cat(output_tensors_list,dim=0)
loss=loss_fn(outputs,labels)
# accumulating
samples_number=len(labels)
epoch_samples_number+=samples_number
current_loss=loss.item()*samples_number # the loss is averaged across samples in each minibatch, so it is multiplied to return to a total
epoch_loss+=current_loss
label_arrays_list.append(labels.flatten().cpu().numpy())
output_arrays_list.append(outputs.flatten().cpu().numpy())
# post-processing
epoch_loss_per_sample=epoch_loss/epoch_samples_number # using the locally accumulated count instead of the global dataset_samples_number[phase]
labels_array=np.concatenate(label_arrays_list)
outputs_array=np.concatenate(output_arrays_list)
return labels_array,outputs_array,epoch_loss_per_sample
# if force_train_evaluation_after_each_epoch:
# train_evaluation_ui=input('force_train_evaluation_after_each_epoch=True was set, it is inefficient (but useful for training analysis), approve? y/[n] ')
# if train_evaluation_ui!='y':
# raise RuntimeError('user did not approve force_train_evaluation_after_each_epoch=True, aborting!')
if plot_realtime_stats_on_logging or plot_realtime_stats_after_each_epoch:
logger.warning('plotting from inside the training loop is not working and should be debugged...')
if train_model_else_load_weights and (__name__=='__main__' or data_workers==0):
stats_dict={'train':{'running metrics':{},'epoch total running metrics':{},
'evaluation metrics':{}},
'val':{'evaluation metrics':{}}}
if torch.cuda.is_available():
logger.info('torch is using %s (%s)'%(device,torch.cuda.get_device_name(device=0)))
else:
logger.info('torch is using %s'%(device))
# model pre-training evaluation
logger.info('started model pre-training evaluation')
for phase in ['train','val']:
dataloader=dataloaders[phase]
labels_array,outputs_array,epoch_loss_per_sample=\
model_evaluation(model,dataloaders[phase],loss_fn)
errors_array=labels_array-outputs_array
epoch_MSE=(errors_array**2).mean()
logger.info('(pre-training, %s) loss per sample: %.3e, sqrt(MSE): sqrt(%.3e)=%.3e'%(
phase,epoch_loss_per_sample,epoch_MSE,epoch_MSE**0.5))
stats_dict[phase]['evaluation metrics'].update({0:
{'loss per sample':epoch_loss_per_sample,
'MSE':epoch_MSE}})
total_batches=epochs*(dataloader_batches_number['train']+dataloader_batches_number['val'])
period_already_logged=0
logger.info('started model training')
print('-'*10)
tic=time()
# model training
for epoch in range(epochs):
for phase in ['train','val']:
if phase == 'train':
model.train() # set model to training mode
else:
model.eval() # set model to evaluate mode
epoch_loss=0.0 # must be a float
epoch_squared_error=0.0
samples_processed_since_last_log=0
loss_since_last_log=0.0 # must be a float
squared_error_since_last_log=0.0
for i_batch,batch in enumerate(dataloaders[phase]):
labels=batch['profile score'].to(device).unsqueeze(1).float()
images_batches_list=batch['images batch']
optimizer.zero_grad() # zero the parameter gradients
# forward
with torch.set_grad_enabled(phase=='train'): # if phase=='train' it tracks tensor history for grad calc
output_tensors_list=[]
for images_batch in images_batches_list:
images_batch=images_batch.to(device)
output=model(images_batch)
output_tensors_list.append(output)
outputs=torch.cat(output_tensors_list,dim=0)
loss=loss_fn(outputs,labels)
if torch.isnan(loss):
raise RuntimeError('reached NaN loss - aborting training!')
# backward + optimize if training
if phase=='train':
loss.backward()
optimizer.step()
scheduler.step()
# accumulating stats
samples_number=len(labels)
samples_processed_since_last_log+=samples_number
current_loss=loss.item()*samples_number # the loss is mean-reduced per minibatch, so multiply by the batch size to recover the batch total
epoch_loss+=current_loss
loss_since_last_log+=current_loss
with torch.set_grad_enabled(False):
batch_squared_error=((outputs-labels)**2).sum().item()
epoch_squared_error+=batch_squared_error
squared_error_since_last_log+=batch_squared_error
# logging running stats
if phase=='train' and period_in_seconds_to_log_loss>0:
passed_seconds=time()-tic
period=passed_seconds//period_in_seconds_to_log_loss
if period>period_already_logged:
period_already_logged=period
loss_since_last_log_per_sample=loss_since_last_log/samples_processed_since_last_log
MSE_since_last_log=squared_error_since_last_log/samples_processed_since_last_log
completed_batches=epoch*(dataloader_batches_number['train']+dataloader_batches_number['val'])+(i_batch+1)
completed_batches_progress=completed_batches/total_batches
logger.info('(epoch %d/%d, batch %d/%d, %s, running) loss per sample: %.3e, sqrt(MSE): sqrt(%.3e)=%.3e'%(
epoch+1,epochs,i_batch+1,dataloader_batches_number[phase],phase,
loss_since_last_log_per_sample,
MSE_since_last_log,MSE_since_last_log**0.5))
partial_epoch=epoch+completed_batches_progress
stats_dict[phase]['running metrics'].update({partial_epoch:
{'batch':i_batch+1,'loss per sample':loss_since_last_log_per_sample,
'MSE':MSE_since_last_log}})
loss_since_last_log=0.0 # must be a float
squared_error_since_last_log=0.0
samples_processed_since_last_log=0
# accumulating epoch stats
epoch_loss_per_sample=epoch_loss/dataset_samples_number[phase]
epoch_MSE=epoch_squared_error/dataset_samples_number[phase]
if phase=='train': # saving running stats
stats_dict[phase]['epoch total running metrics'].update({epoch+1:
{'loss per sample':epoch_loss_per_sample,
'MSE':epoch_MSE}})
if force_train_evaluation_after_each_epoch: # train dataloader evaluation
labels_array,outputs_array,epoch_loss_per_sample=\
model_evaluation(model,dataloaders[phase],loss_fn)
errors_array=labels_array-outputs_array
epoch_MSE=(errors_array**2).mean()
stats_dict[phase]['evaluation metrics'].update({epoch+1:
{'loss per sample':epoch_loss_per_sample,
'MSE':epoch_MSE}})
else: # val dataloader evaluation
stats_dict[phase]['evaluation metrics'].update({epoch+1:
{'loss per sample':epoch_loss_per_sample,
'MSE':epoch_MSE}})
if phase=='val': # updating best model results
if best_model_criterion=='min val epoch MSE':
best_criterion_current_value=epoch_MSE
if epoch==0:
best_criterion_best_value=best_criterion_current_value
best_model_wts=copy.deepcopy(model.state_dict())
best_epoch=epoch
else:
if best_criterion_current_value<best_criterion_best_value:
best_criterion_best_value=best_criterion_current_value
best_model_wts=copy.deepcopy(model.state_dict())
best_epoch=epoch
# logging evaluation stats
if phase=='val' or force_train_evaluation_after_each_epoch:
completed_epochs_progress=(epoch+1)/epochs
passed_seconds=time()-tic
expected_seconds=passed_seconds/completed_epochs_progress*(1-completed_epochs_progress)
expected_remainder_time=DatingAI.remainder_time(expected_seconds)
logger.info('(epoch %d/%d, %s, evaluation) epoch loss per sample: %.3e, epoch sqrt(MSE): sqrt(%.3e)=%.3e\n\tProgress: %.2f%%, ETA: %dh:%dm:%.0fs'%(
epoch+1,epochs,phase,
epoch_loss_per_sample,
epoch_MSE,epoch_MSE**0.5,
100*completed_epochs_progress,
expected_remainder_time.hours,
expected_remainder_time.remainder_minutes,
expected_remainder_time.remainder_seconds))
print('-'*10)
toc=time()
elapsed_sec=toc-tic
logger.info('finished training %d epochs in %dm:%.1fs'%(
epochs,elapsed_sec//60,elapsed_sec%60))
if return_to_best_weights_in_the_end:
model.load_state_dict(best_model_wts)
logger.info("loaded weights of best model according to '%s' criterion: best value %.3f achieved in epoch %d"%(
best_model_criterion,best_criterion_best_value,best_epoch+1))
else: # train_model_else_load_weights==False
model_name_ui=input('model weights file name to load: ')
model_weights_file_path=os.path.join(models_folder_path,model_name_ui)
if not os.path.isfile(model_weights_file_path):
raise RuntimeError('%s does not exist!'%model_weights_file_path)
model_weights=torch.load(model_weights_file_path)
model.load_state_dict(model_weights)
logger.info('model weights from %s were loaded'%model_weights_file_path)
I started this project for my interest, challenge and experience. "Great" results are not among the goals, and are not expected, since:
# plot_running_stats=True
plot_running_stats=False
plot_epoch_total_running_stats=True
# plot_epoch_total_running_stats=False
plot_loss_in_log_scale=False
# plot_loss_in_log_scale=True
figure_size=(10,4) # (width,height) in inches
# end of inputs ---------------------------------------------------------------
logger.info("remember that even if loss_name='MSE', the loss may include regularization or auxiliary terms (as in inception v3) and therefore may not equal MSE!")
if not (plot_realtime_stats_on_logging or plot_realtime_stats_after_each_epoch):
fig=plt.figure(figsize=figure_size)
plt.suptitle('training stats')
loss_subplot=plt.subplot(1,2,1)
MSE_subplot=plt.subplot(1,2,2)
DatingAI.training_stats_plot(stats_dict,fig,loss_subplot,MSE_subplot,plot_loss_in_log_scale,
plot_running_stats,plot_epoch_total_running_stats)
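The warning logged above matters when reading the loss subplot: with Inception v3 in training mode, the optimized loss typically combines the main and auxiliary heads (a common weighting is main + 0.4*aux), so the logged "loss per sample" can exceed the plain MSE even when loss_name='MSE'. A toy computation with made-up MSE values:

```python
mse_main = 0.50   # MSE of the main output head (made-up value)
mse_aux = 0.90    # MSE of the auxiliary head, active only in train mode (made-up value)
aux_weight = 0.4  # a commonly used auxiliary-loss weight for Inception v3

combined_loss = mse_main + aux_weight * mse_aux
# combined_loss (~0.86) > mse_main (0.50): the optimized loss is not the plain MSE
```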
Since return_to_best_weights_in_the_end=True and best_model_criterion='min val epoch MSE' were set, the weights that achieved the minimal validation MSE during training were loaded back into the model after training completed. As logged (and shown in the training stats figure), the best criterion value was achieved in epoch 1. The model with those weights is evaluated here:
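The restore-best mechanism described above reduces to a simple pattern: snapshot the weights with copy.deepcopy whenever the criterion improves, and load the snapshot once training ends. A minimal sketch with a plain dict standing in for model.state_dict() (the per-epoch validation MSE values are made up):

```python
import copy

val_MSE_per_epoch = [0.80, 0.55, 0.62, 0.71]             # made-up values; epoch 2 is best
weights_per_epoch = [{'w': float(e)} for e in range(4)]  # stand-ins for state_dict()

best_value = None
for epoch, (epoch_MSE, wts) in enumerate(zip(val_MSE_per_epoch, weights_per_epoch)):
    if best_value is None or epoch_MSE < best_value:  # 'min val epoch MSE' criterion
        best_value = epoch_MSE
        best_model_wts = copy.deepcopy(wts)  # snapshot, detached from later updates
        best_epoch = epoch

# after training: restore the snapshot (model.load_state_dict(best_model_wts) in PyTorch)
```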
distribution_is_continuous=True # the plotted errors are continuous (predictions are continuous), even though the profile-score targets are discrete
normalization='over total' # heights=counts/sum(discrete_hist)
opacity=0.6
bins=50
# end of inputs ---------------------------------------------------------------
logger.info('started model evaluation')
plt.figure()
for phase in ['train','val']:
dataloader=dataloaders[phase]
labels_array,outputs_array,epoch_loss_per_sample=\
model_evaluation(model,dataloaders[phase],loss_fn)
errors_array=labels_array-outputs_array
DatingAI.easy_hist(errors_array,distribution_is_continuous=distribution_is_continuous,
bins=bins,normalization=normalization,label=phase,opacity=opacity)
epoch_MSE=(errors_array**2).mean()
logger.info('(post-training, %s) loss per sample: %.3e, sqrt(MSE): sqrt(%.3e)=%.3e'%(
phase,epoch_loss_per_sample,epoch_MSE,epoch_MSE**0.5))
logger.info('completed model evaluation')
plt.title('training and validation error distributions')
plt.xlabel('errors (targets-predictions)')
plt.legend(loc='best');
if offer_mode_saving and train_model_else_load_weights:
try: os.mkdir(models_folder_path)
except FileExistsError: pass # if the folder exists already - do nothing
saving_decision=input('save model weights? [y]/n ')
if saving_decision!='n':
model_name_ui=input('name model weights file: ')
model_weights_file_path=os.path.join(models_folder_path,model_name_ui+'.ptweights')
if os.path.isfile(model_weights_file_path):
alternative_filename=input('%s already exists, give a different file name to save, the same file name to overwrite, or hit enter to abort: '%model_weights_file_path)
if alternative_filename=='':
raise RuntimeError('aborted by user')
else:
model_weights_file_path=os.path.join(models_folder_path,alternative_filename+'.ptweights')
torch.save(model.state_dict(),model_weights_file_path)
logger.info('%s saved'%model_weights_file_path)
logger.info('script completed')